cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

Abstract

Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARKs like Groth16 have a high overhead in the proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL), number-theoretic transform (NTT), and multi-scalar multiplication (MSM). Therefore, this paper presents cuZK, an efficient GPU implementation of zkSNARK with the following three techniques to achieve high performance. First, we propose a new parallel MSM algorithm. This algorithm achieves nearly perfect linear speedup over the Pippenger algorithm, a well-known serial MSM algorithm. Second, we parallelize the MUL operation. Along with our self-designed MSM scheme and the well-studied NTT scheme, cuZK parallelizes all operations in the proof generation step. Third, cuZK reduces the latency caused by CPU-GPU data transfer by 1) reducing redundant data transfer and 2) overlapping data transfer with device computation. The evaluation results show that the MSM module provides a 2.08× (up to 2.94×) speedup versus the state-of-the-art implementation. cuZK achieves a 2.65× (up to 4.86×) speedup on standard benchmarks and a 2.18× speedup on a GPU-accelerated cryptocurrency application, Filecoin.
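MSM computes Q = Σ k_i·P_i over n scalar/point pairs. The following is a minimal serial sketch of the Pippenger (bucket) method named above as the baseline; it is not cuZK's parallel algorithm. The toy group (integers modulo the hypothetical prime `P` standing in for elliptic-curve points), the window width `c`, and the scalar bound `scalar_bits` are illustrative assumptions.

```python
from typing import List

# Assumption: a toy additive group (integers mod a prime) stands in for
# elliptic-curve points. The bucket logic only uses group addition, so the
# same structure applies to real curve arithmetic.
P = 2**61 - 1  # hypothetical modulus, for illustration only


def add(a: int, b: int) -> int:
    """Group operation: 'point addition' in the toy group (0 is the identity)."""
    return (a + b) % P


def msm_naive(scalars: List[int], points: List[int]) -> int:
    """Reference result: sum of k_i * P_i (cheap here only because the group is Z_P)."""
    return sum(k * pt for k, pt in zip(scalars, points)) % P


def msm_pippenger(scalars: List[int], points: List[int],
                  c: int = 4, scalar_bits: int = 16) -> int:
    """Bucket-method MSM: split each scalar into c-bit windows."""
    assert all(0 <= k < (1 << scalar_bits) for k in scalars)
    num_windows = (scalar_bits + c - 1) // c
    mask = (1 << c) - 1
    window_sums = []
    for w in range(num_windows):
        # Bucket j collects every point whose w-th window digit equals j.
        buckets = [0] * (1 << c)
        for k, pt in zip(scalars, points):
            digit = (k >> (w * c)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], pt)
        # Compute sum_j j * buckets[j] with running suffix sums (additions only).
        running, total = 0, 0
        for j in range(mask, 0, -1):
            running = add(running, buckets[j])
            total = add(total, running)
        window_sums.append(total)
    # Combine windows from most to least significant: result = 2^c * result + S_w.
    result = 0
    for s in reversed(window_sums):
        for _ in range(c):
            result = add(result, result)  # c doublings
        result = add(result, s)
    return result


scalars = [5, 1000, 65535, 0, 12345]
points = [3, 17, 2**40 + 7, 99, 123456789]
print(msm_pippenger(scalars, points) == msm_naive(scalars, points))  # True
```

Each window's bucket accumulation is independent of the others, which is one natural source of parallelism in Pippenger's method; how cuZK distributes the work across GPU threads is described in the paper and is not modeled here.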

Similar resources

Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned ...

Faster Implementation of Scalar Multiplication on Koblitz Curves

We design a state-of-the-art software implementation of field and elliptic curve arithmetic in standard Koblitz curves at the 128-bit security level. Field arithmetic is carefully crafted by using the best formulae and implementation strategies available, and the increasingly common native support to binary field arithmetic in modern desktop computing platforms. The i-th power of the Frobenius ...

A Faster Parallel Algorithm for Matrix Multiplication on a Mesh Array

Matrix multiplication is a fundamental mathematical operation that has numerous applications across most scientific fields. Cannon’s distributed algorithm to multiply two n-by-n matrices on a two dimensional square mesh array with n cells takes exactly 3n−2 communication steps to complete. We show that it is possible to perform matrix multiplication in just 1.5n − 1 communication steps on a two...

Accelerating Radiosity on GPUs

We propose a novel approach to implement radiosity on GPU with specific optimizations via form-factor matrix transformations. The proposed transformations enable to reduce the amount of computations for multiple-bounce global illumination and apply DXT compression (with subsequent hardware decompression when reading formfactors on GPU). Our implementation is 10 times faster running and requires...

Journal

Journal title: IACR transactions on cryptographic hardware and embedded systems

Year: 2023

ISSN: ['2569-2925']

DOI: https://doi.org/10.46586/tches.v2023.i3.194-220